Introduction

This document records the construction of the Index, in particular the process of going from the denominated input data to the initial index results. It follows on from the “Indicator Analysis”, in which indicators were analysed and then screened and selected.

The steps followed here will be:

  • Outlier treatment
  • Normalisation
  • Aggregation

We begin by building the Index coin following the same steps as in the “indicator analysis”. For convenience, these steps have now been condensed into a dedicated function.

With the coin in hand we can now proceed to the construction steps.

Outlier treatment

Outlier treatment aims to adjust the distributions of highly skewed or fat-tailed indicators, including cases where a few outliers are not characteristic of the rest of the distribution. This improves the discriminatory power of the indicator in aggregation: without treatment, a single extreme value would compress all the other units into a narrow band once the indicator is min-max normalised. For more on this, see here.

As found in the previous document, a number of indicators require data treatment. To deal with this we apply the following standard procedure to each indicator:

  1. Check the skew and kurtosis of the indicator
  2. If the absolute skew is greater than 2 AND the kurtosis is greater than 3.5, apply data treatment (step 3 onwards); otherwise leave the indicator as it is and return to step 1 for the next indicator.
  3. Winsorise up to a maximum of five points, one point at a time. After each Winsorised point, check whether skew and kurtosis are back within the limits. If so, apply no further treatment and move to the next indicator.
  4. If the maximum number of Winsorised points is reached and skew and kurtosis are still outside the thresholds, “undo” the Winsorisation and apply a log transformation instead.
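The procedure can be sketched as follows. This is a minimal Python illustration, not COINr's implementation, and several details are assumptions: skew and kurtosis use simple moment estimators (kurtosis taken as excess kurtosis, consistent with the negative values in the results table), Winsorisation caps the tail the skew points to, and the log transform is taken as log(x - min(x) + 1) so that it is always defined.

```python
import math


def skew_kurt(x):
    """Moment-based sample skewness and excess kurtosis.

    Assumption: COINr's own estimators may include small-sample corrections.
    """
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    if m2 == 0:
        return 0.0, 0.0  # constant data: no skew, no tails
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3


def needs_treatment(x, skew_lim=2.0, kurt_lim=3.5):
    """Step 2: treat only if |skew| > 2 AND kurtosis > 3.5."""
    s, k = skew_kurt(x)
    return abs(s) > skew_lim and k > kurt_lim


def winsorise(x, n_points, upper=True):
    """Set the n most extreme values in one tail to the next value inwards."""
    order = sorted(range(len(x)), key=lambda i: x[i], reverse=upper)
    cap = x[order[n_points]]
    y = list(x)
    for i in order[:n_points]:
        y[i] = cap
    return y


def treat_indicator(x, max_win=5):
    """Steps 1-4: return (treated values, treatment label)."""
    if not needs_treatment(x):
        return list(x), "None"
    upper = skew_kurt(x)[0] > 0  # Winsorise the tail the skew points to
    for n_win in range(1, max_win + 1):
        y = winsorise(x, n_win, upper)
        if not needs_treatment(y):
            return y, f"Win: {n_win}"
    # Step 4: discard the Winsorisation and log-transform the original data.
    # Assumption: shifting so that the minimum becomes 1 keeps the log defined.
    shift = 1 - min(x)
    return [math.log(v + shift) for v in x], "Log"
```

For example, `treat_indicator(list(range(1, 20)) + [1000])` returns the label `"Win: 1"`: capping the single outlier at 19 is already enough. Re-Winsorising from the original data with an increasing number of points, rather than editing in place, keeps ties in the tail from stalling the loop.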

This process is built into COINr as the default outlier treatment, so it can be applied directly.

Let’s check the skew and kurtosis statistics before treatment (suffix _0) and after treatment (suffix _1):

#>     iCode Pass_0 Skew_0 Kurt_0 Treatment Pass_1 Skew_1 Kurt_1
#> 1   A.M.1  FALSE   5.04  27.89       Log   TRUE   1.93   2.73
#> 2   A.M.2  FALSE   2.14   5.82    Win: 2   TRUE   1.94   4.28
#> 3   A.M.4  FALSE   4.10  24.25       Log   TRUE   0.57  -0.23
#> 4   S.A.4  FALSE   9.12 101.17       Log  FALSE   2.54   7.44
#> 5   S.E.2  FALSE   2.39   8.28    Win: 3   TRUE   1.89   4.27
#> 6   S.G.4  FALSE   3.71  17.98       Log   TRUE   0.22  -0.13
#> 7   S.G.5  FALSE   8.70 108.31       Log   TRUE   1.03   1.09
#> 8   S.G.6  FALSE   6.25  46.51       Log  FALSE   3.59  12.17
#> 9   S.G.7  FALSE  10.47 141.69       Log   TRUE   1.06   4.69
#> 10  C.E.4  FALSE   9.10  95.10       Log   TRUE   0.71  15.67
#> 11  C.E.6  FALSE   2.59   7.90       Log   TRUE   0.03   0.23
#> 12  C.E.9  FALSE   4.82  39.79    Win: 2   TRUE   1.81   5.57
#> 13 C.E.11  FALSE   2.45   8.26    Win: 2   TRUE   1.86   3.54
#> 14  C.I.5  FALSE   5.23  44.51       Log   TRUE   0.50   1.97
#> 15  C.I.6  FALSE  -2.20   5.17       Log  FALSE  -6.07  52.33
#> 16  C.J.2  FALSE  16.66 293.71       Log  FALSE   4.13  23.17
#> 17  C.S.4  FALSE   5.05  32.43       Log   TRUE   1.19   2.09

Most indicators (13 of 17) have been treated with a log transformation, while the remaining four have been Winsorised. After treatment, four indicators (S.A.4, S.G.6, C.I.6 and C.J.2) still fall outside the skew/kurtosis limits. We check these visually.

This reveals a problem: one of the indicators, C.I.6, is negatively skewed (skew of -2.20 before treatment). A log transformation does not help here because it corrects positive skew; for C.I.6 it actually made the skew worse (-6.07 after treatment). To deal with this I have added a function to COINr that can also handle negative skew: it checks the direction of the skew and applies the appropriate transformation. This function is invoked here.
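One way a direction-aware log transform can work is sketched below. This is a hypothetical illustration, not the COINr function described above: `log_trans_directional` and its formulas are assumptions. For positive skew it uses log(x - min(x) + 1); for negative skew it reflects the data, takes the log and negates the result, i.e. -log(max(x) - x + 1), so that the original ordering of the values is preserved.

```python
import math


def skewness(x):
    """Moment-based sample skewness (0 for constant data)."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    return m3 / m2 ** 1.5 if m2 > 0 else 0.0


def log_trans_directional(x):
    """Hypothetical direction-aware log transform (not COINr's actual code)."""
    if skewness(x) >= 0:
        # Positive skew: ordinary shifted log, compresses the upper tail.
        lo = min(x)
        return [math.log(v - lo + 1) for v in x]
    # Negative skew: reflect (max - x), log, then negate to restore the order.
    hi = max(x)
    return [-math.log(hi - v + 1) for v in x]
```

Because both the reflection and the negation reverse the order, their combination leaves it unchanged, and the log compresses the long lower tail in the same way it compresses a long upper tail in the positive case.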

Now let’s check the outcome, focusing on “C.I.6”, the problematic indicator.

This demonstrates the effectiveness of the new transformation: it has brought the distribution much closer to normal while retaining the ordering of the values. The scale of the indicator is now different (as with any transformation), but this does not matter: all indicators are rescaled onto 0-100 in the next step anyway, and both the transformation and the scaling serve only the purposes of aggregation. When presenting individual indicators, we will of course show the real data.

Normalise

Next we normalise the indicators using the standard min-max approach, which rescales each indicator linearly onto the [0, 100] interval.
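Min-max normalisation maps each value x of an indicator to (x - min(x)) / (max(x) - min(x)) × 100, so the lowest-scoring unit gets 0 and the highest gets 100. A minimal Python sketch follows; raising an error for a constant indicator is a design choice of this sketch, not necessarily COINr's behaviour.

```python
def min_max(x, lo=0.0, hi=100.0):
    """Rescale x linearly so that min(x) maps to lo and max(x) maps to hi."""
    x_min, x_max = min(x), max(x)
    if x_max == x_min:
        raise ValueError("cannot min-max normalise a constant indicator")
    return [lo + (v - x_min) / (x_max - x_min) * (hi - lo) for v in x]


print(min_max([2, 4, 6, 10]))  # [0.0, 25.0, 50.0, 100.0]
```

Since the mapping is linear, it preserves both the ordering and the relative distances between units; only the earlier treatment step changes the shape of the distribution.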

Aggregate

Now we create the aggregate levels by aggregating up to the index. Recall that each aggregate score is the weighted arithmetic mean of the normalised scores at the level below. Weights are defined in the input file (input metadata) and are currently all equal. Weight adjustment will come in a later step; for now we aggregate using the default approach.
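The level-by-level weighted arithmetic mean can be sketched as below, assuming aggregation proceeds one level at a time. The codes `i1`, `c1`, `d1` etc. are hypothetical placeholders for indicators, categorias and a dimension, not codes from the input metadata.

```python
def weighted_mean(scores, weights):
    """Weighted arithmetic mean of a dict of scores; weights need not sum to 1."""
    total_w = sum(weights[k] for k in scores)
    return sum(scores[k] * weights[k] for k in scores) / total_w


def aggregate_level(child_scores, groups, weights):
    """Aggregate one level: groups maps each parent code to its child codes."""
    return {
        parent: weighted_mean({c: child_scores[c] for c in children}, weights)
        for parent, children in groups.items()
    }


# Hypothetical example for a single municipality, with all weights equal.
weights = {k: 1.0 for k in ["i1", "i2", "i3", "c1", "c2", "d1"]}
indicators = {"i1": 20.0, "i2": 40.0, "i3": 90.0}
categorias = aggregate_level(indicators, {"c1": ["i1", "i2"], "c2": ["i3"]}, weights)
print(categorias)  # {'c1': 30.0, 'c2': 90.0}
print(aggregate_level(categorias, {"d1": ["c1", "c2"]}, weights))  # {'d1': 60.0}
```

Note that with equal weights at every level the result (60) differs from the plain mean of the three indicators (50): an indicator sitting alone in its group carries more effective weight than one that shares its group.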

This has created all the aggregate scores: categoria scores, dimension scores, and the MVI scores themselves.

Results

Our first view of the results is a table, sorted by default by Index score from the highest-scoring (most vulnerable) municipalities downwards.

These results should be checked to see whether they agree with common sense. Another way of looking at the results is a bar chart. Since there are many municipalities, I plot only the top thirty, coloured by departamento.

We can also plot the same chart broken down by dimension scores, which shows how much each dimension contributes to the total score.
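With a weighted arithmetic mean, the index score splits exactly into dimension contributions: dimension d contributes w_d × y_d / Σw, and these contributions add up to the index score. The sketch below, with hypothetical dimension codes `D1` to `D3`, shows why a stacked bar built from such contributions reaches exactly the index score. It holds only for arithmetic-mean aggregation.

```python
def contributions(dim_scores, weights):
    """Share of the index score contributed by each dimension."""
    total_w = sum(weights[d] for d in dim_scores)
    return {d: weights[d] * s / total_w for d, s in dim_scores.items()}


dims = {"D1": 60.0, "D2": 30.0, "D3": 90.0}  # hypothetical dimension scores
contrib = contributions(dims, {"D1": 1.0, "D2": 1.0, "D3": 1.0})
print(contrib)                # {'D1': 20.0, 'D2': 10.0, 'D3': 30.0}
print(sum(contrib.values()))  # 60.0, the index score (mean of 60, 30, 90)
```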

As a final view of the results (for the moment), we plot a choropleth map based on the municipal shapefiles.

#> OGR data source with driver: ESRI Shapefile 
#> Source: "/home/edouard/R-projects/Americas_project/MVI_Guatemala/inst/shp/gtm_admbnda_adm2_ocha_conred_20190207.shp", layer: "gtm_admbnda_adm2_ocha_conred_20190207"
#> with 342 features
#> It has 14 fields

Conclusions

The next steps from here are probably:

  • The “reality check” of the results: do they make sense to you as experts in the field?
  • Have a think about each step of the methodology: should we do anything differently? In any case, we can always try alternative approaches and compare the results.
  • Check the resulting structure of the index again: are there any big gaps in terms of what is measured? Are any reshuffles of indicators/categorias still needed?

The aim is to be fairly sure, before proceeding, that the core methodology is sound and the results are realistic. I would then “finalise” the indicator analysis and index construction documents and tidy up the figures etc.

After that, we can move to the next phases. First, I would begin building the “modules” for each step of the index construction, partly reusing the code written here. We will also need a weight adjustment function. The code can then be packaged more cleanly (possibly even as a small R package for convenience) and documented.